library(terra)
library(sf)
library(tmap)
library(dplyr)
library(tmaptools)
library(twsagps)Creating software for environmental measurements based on GPS data - twsagps package report
Introduction
This report comprehends the present state of twsagps package - current issues and further development plans. Package consists of four functions PO_exposure(), KDE_exposure(), DR_exposure() and LS_exposure() each utilizing one Time Weighted Spatial Averaging (TWSA) method. The TWSA methods are taken from the reference paper (Jankowska et al. 2023) on which the package is based upon - Point Overlay (PO), Kernel Density Estimation (KDE), Density Ranking (DR) with the addition of Line Segment (LS). Report demonstrates code workflow and outcomes.
For this demo following packages were used:
Research area and sample data
For presenting package functioning Geolife GPS Trajectories 1.3 Data (Zheng et al. 2009), (Zheng et al. 2008), (Yu Zheng 2010) was used as sample GPS data. The Geolife dataset consists of significant volumes of time-stamped GPS data retrieved from 182 pariticipants in over five year time span. Research area was limited to San Diego county and GPS data was restricted to the research area. To improve running time of functions Geolife data was reduced to each 10th record. The GPS data was collected between 17th August 2011 and 20th August 2011.
geol_file = read.csv("geolife_san_diego.csv")
geolife_days = geol_file |> select(-X) |>
slice(which(row_number() %% 10 == 1))
geolife_time = geolife_days |>
mutate(dateTime = paste(date, time),
dateTime = as.POSIXct(dateTime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))| lat | lon | date | time | dateTime |
|---|---|---|---|---|
| 33.26486 | -117.4323 | 2011-08-17 | 21:38:11 | 2011-08-17 21:38:11 |
| 33.26239 | -117.4301 | 2011-08-17 | 21:38:21 | 2011-08-17 21:38:21 |
| 33.25969 | -117.4280 | 2011-08-17 | 21:38:31 | 2011-08-17 21:38:31 |
| 33.25700 | -117.4259 | 2011-08-17 | 21:38:41 | 2011-08-17 21:38:41 |
| 33.25437 | -117.4238 | 2011-08-17 | 21:38:51 | 2011-08-17 21:38:51 |
| 33.25162 | -117.4216 | 2011-08-17 | 21:39:01 | 2011-08-17 21:39:01 |
NDVI raster data was applied as sample environmental layer. Indicator was calculated from a Landsat 8 image involving San Diego county which was captured on 22nd Februrary 2024.
landsat_red = rast("landsat_ndvi/LC09_L2SP_040037_20240222_20240224_02_T1_SR_B4.TIF")
landsat_nir = rast("landsat_ndvi/LC09_L2SP_040037_20240222_20240224_02_T1_SR_B5.TIF")
landsat_ndvi = (landsat_nir - landsat_red) / (landsat_nir + landsat_red)landsat_red = rast("landsat_ndvi/LC09_L2SP_040037_20240222_20240224_02_T1_SR_B4.TIF")
landsat_nir = rast("landsat_ndvi/LC09_L2SP_040037_20240222_20240224_02_T1_SR_B5.TIF")
landsat_ndvi = (landsat_nir - landsat_red) / (landsat_nir + landsat_red)
Methods
Point Overlay (PO)
Point Overlay method is implemented in PO_exposure() function. Evaluated by converting GPS point vector data to raster data type using sum function. …
Kernel Density Estimation (KDE)
KDE_exposure() utilizes the KDE method. Method generates smoothed buffers aroud GPS data considering differences in spatial distribution and influence. Function calculates exposure from GPS points coordinates on matrix of coordinates using kde() from TDA package. …
Density Ranking (DR)
Density Ranking method in DR_exposure() is based on DR function from yenchic/density_ranking GitHub repository[Chen and Dobra (2020)](Chen 2019) and implemented as DR_simple() adapted to processing data in projected coordinates systems. DR serves as a improvement to KDE method by ranking KDE values (Jankowska et al. 2023). …
Line Segment (LS)
New addition in TWSA approaches is a Line Segment method applied in LS_exposure() drawn from michaeldgarber/microclim-static-v-dynam GitHub repository. Code applied in LS_exposure() consists of modified functions from the repository’s demo. Line Segment puts emphasis on time component of GPS data. Firstly, trajectories following GPS movement and buffers around each trajectory segment are created. Secondly, weighted values for each segment are extracted from environmental layer and combined with time elapsed between every two GPS records. Obtained polygons are converted to raster with sum function afterwards.
Processing
Parameters
User may provide various arguments to functions in twsagps package
x - data frame of GPS data with lon/lat columns
cellsize - size of cells in output raster
normalize - if TRUE activity space raster values will be rescaled to 0-1 range
bandwidth - bandwidth (DR/KDE) size of buffer in meters (LS)
time_data - column name containing time data of GPS data (LS)
time_unit - unit of time weights (LS)
stats - output statistics calculated on output raster
start_crs - coordinate reference system of GPS data
end_crs - coordinate reference system of output layer
env_data - environmental data raster
x = geolife_time
cellsize = 50
normalize = TRUE
bandwidth = 200 # in meters
time_data = "dateTime"
time_unit = "mins"
stats = c("count", "area", "min", "max", "range", "mean", "std", 'sum')
start_crs = "WGS84"
end_crs = crs(landsat_ndvi)
env_data = landsat_ndvi
# KDE and DR bandwidths discussed in Package Issues
bandwidth_kde = 25
bandwidth_dr = 70Entry processing
Entry processing of GPS data and environmental data shared by all functions. Transforming GPS data to spatial data object and projecting GPS data and environemtal raster to common CRS.
# get spatial data
geol_points = terra::vect(x = geolife_time, geom = c("lon", "lat"), crs = start_crs)
# crs
if (!is.null(end_crs)){
geol_proj = terra::project(geol_points, terra::crs(end_crs))
} else if (!is.null(env_data)) {
geol_proj = terra::project(geol_points, terra::crs(env_data))
} else {
geol_proj = geol_points
}
if (!is.null(env_data)){ # change env_data crs beforehand
env_data_proj = terra::project(env_data, terra::crs(geol_proj))
}Extent
Calculating extented extent for DR/KDE/LS methods for the purpose of preventing cropping cells with values. DR and KDE extents are expanded by constant 5% of former boundary box. Preceding computing of new extent in LS trajectories and buffers are processed. Subsequantly, the extent is based on buffer vector layer boundary box.
### DR/KDE ###
buff_const = 0.05 #percent of extent as bbox
# get extent
geol_extent = terra::ext(geol_proj)
# buffer around extent
x_const_dr = (terra::xmax(geol_extent) - terra::xmin(geol_extent)) * buff_const
y_const_dr = (terra::ymax(geol_extent) - terra::ymin(geol_extent)) * buff_const
# new extent
new_extent = c(terra::xmin(geol_extent) - x_const_dr,
terra::xmax(geol_extent) + x_const_dr,
terra::ymin(geol_extent) - y_const_dr,
terra::ymax(geol_extent) + y_const_dr)Code
### LS function ###
trajectories_fun = function(data){
trajectories_out = data |>
sf::st_as_sf() |> # change to sf object to change points to linestring later
#filter to the study id defined by the argument
#define start and end points of line
dplyr::mutate(
line_id = dplyr::row_number(),#an id for each "line segment"
x_start= sf::st_coordinates(geometry)[,1],
y_start= sf::st_coordinates(geometry)[,2],
x_end = dplyr::lead(x_start),
y_end = dplyr::lead(y_start)
) |>
dplyr::ungroup() |>
sf::st_set_geometry(NULL) |>
#exclude the last observation, which has no "lead", and will be missing.
dplyr::filter(is.na(x_end)==FALSE) |>
# Make the data long form so that each point has two observations
tidyr::pivot_longer(
# select variables to pivot longer.
cols = c(x_start, y_start, x_end, y_end),
#value goes to "x/y", and time goes to "_start/end"
names_to = c(".value", "time"),
names_repair = "unique",
names_sep = "_"#the separator for the column name
) |>
# create sf object once again
sf::st_as_sf(coords = c("x", "y"), crs= sf::st_crs(data)) |>
dplyr::group_by(line_id) |>
dplyr::summarize(do_union = FALSE) |>
sf::st_cast("LINESTRING") |> # cast linestring type
sf::st_as_sf() |>
dplyr::ungroup()
return(trajectories_out)
}### LS ###
# create line segments from spatial points
trajectories = terra::vect(trajectories_fun(geol_proj))
# create buffers over line segments
traj_buff = trajectories |>
tidyterra::select(line_id) |>
terra::buffer(bandwidth) # takes a lot of time
#
extent_buff = terra::ext(traj_buff)tmap mode set to interactive viewing
Line Segment buffers with trajectories
Grid Raster
Compute Grid Raster utilized in calculating output raster. Grid consists of previously specified extent, cellsize argument or environmental data raster.
## PO
extent = terra::ext(geol_proj)
## KDE/DR
# extent = new_extent
## LS
# extent = extent_buff
# implement cellsize > 0 condition
if (is.numeric(cellsize)) { # cellsize included
if (terra::linearUnits(geol_proj) == 0){ # crs units in degrees
dist_lon = geosphere::distm(c(extent[1], extent[3]), c(extent[2], extent[3]),
fun = geosphere::distHaversine)
dist_lat = geosphere::distm(c(extent[1], extent[3]), c(extent[1], extent[4]),
fun = geosphere::distHaversine)
# number of cells
x_cells = (dist_lon / cellsize) |> as.integer()
y_cells = (dist_lat / cellsize) |> as.integer()
# empty_rast for units in degrees
grid_rast = terra::rast(crs = terra::crs(geol_proj), nrows=y_cells,
ncols=x_cells, extent = extent)
} else {
grid_rast = terra::rast(crs = terra::crs(geol_proj), extent = extent,
resolution = cellsize)
}
} else if (!is.null(env_data)){ #if incorrect cellsize and env_data exists
grid_rast = env_data_proj
terra::ext(grid_rast) = extent
}Activity Space
Point Overlay
Activity space of Point Overlay method is evaluated by converting points vector data to raster using sum function.
geol_proj$rast = 1
rast_points = terra::rasterize(geol_proj, grid_rast, field = "rast", fun = sum)Kernel Density Eestimation/Density Ranking
KDE and DR functions implemented in KDE_exposure() and DR_exposure() require data frames of GPS points coordinates and coordinates of every cell of Grid Raster (KDE) or properties of the Grid Raster (DR) as inputs.
### KDE ###
# number of cells
x_cells = terra::ncol(grid_rast)
y_cells = terra::nrow(grid_rast)
# coordinate seq
x_seq = seq(new_extent[1], new_extent[2], length.out = x_cells)
y_seq = seq(new_extent[3], new_extent[4], length.out = y_cells)
# coords of every cell
expand_grid = expand.grid(x_seq,y_seq)
# point coords
coords = terra::geom(geol_proj)[,3:4]
# kde functions from TDA package
kde_data = TDA::kde(coords, Grid = expand_grid, h = sqrt(bandwidth_kde))
kde_rast = grid_rast
# insert kde values to raster
terra::values(kde_rast) = kde_data
kde_rast = terra::flip(kde_rast, direction='vertical') # flip raster### DR ###
# point coords
coords = terra::geom(geol_proj)[,3:4]
# number of cells
x_cells = terra::ncol(grid_rast)
y_cells = terra::nrow(grid_rast)
# calculate DR data
dr_data = DR_simple(coords, kernel= "Gaussian", h = bandwidth_dr,
x_res=x_cells, y_res=y_cells, xlim=new_extent[1:2],
ylim=new_extent[3:4])
dr_rast = grid_rast
# insert dr values to raster
terra::values(dr_rast) = dr_data$gr_alpha
dr_rast = terra::flip(dr_rast, direction='vertical') # flip rasterActivity Space Rasters of PO, KDE and DR methods
Normalization
Rescale activity space raster’s values to 0-1 range.
### PO ###
fin_rast = rast_points
if (normalize == TRUE){
fin_rast = terra::subst(fin_rast, from = NA, to = 0) # proper range for normalization
rast_minmax = terra::minmax(fin_rast) # minmax
# calculate normalization to 0-1 range
fin_rast = (rast_points - rast_minmax[1,])/(rast_minmax[2,]-rast_minmax[1,])
fin_rast = terra::subst(fin_rast, from = 0, to = NA) # insert NA
}### KDE/DR ###
## KDE
fin_rast = kde_rast
## DR
# fin_rast = dr_rast
if (normalize == TRUE){ # normalize values
rast_minmax = terra::minmax(fin_rast) # minmax
# calculate normalization to 0-1 range
fin_rast = (fin_rast - rast_minmax[1,])/(rast_minmax[2,]-rast_minmax[1,])
}
fin_rast = terra::subst(fin_rast, from = 0, to = NA)Normalized Activity Space for PO, KDE and DR methods
Environmental exposure
Point Overlay/Kernel Density Estimation/Density Ranking
Environmental exposure layer is computed by multiplying activity space layer and environmental data layer.
if (!is.null(env_data)){ # calculate exposure
env_data_resamp = terra::resample(env_data_proj, fin_rast)
rast_env = fin_rast * env_data_resamp
}Line Segment
Computing environmental exposure layer in Line Segment method requires assigning environmental data raster values to Grid Raster. Creating activity space demands allocating all Grid Raster values to 1.
if (!is.null(env_data)){ # env_data included
# replace vals of grid with env values
env_resamp = terra::resample(env_data_proj, grid_rast)
terra::values(grid_rast) = terra::values(env_resamp)
} else {
terra::values(grid_rast) = 1
}Consecutively, Grid Raster values are extracted to trajectory buffer and later on summarized by area overlap for each segment. Time elapsed between every two GPS data points are calculated in order to take cognistance of time weights. Finally, vector data is converted to raster based on exposure and time weights with sum function.
# extract values and weights (area overlap) from grid for each cell of buffer
traj_extract= grid_rast |>
terra::extract(traj_buff, na.rm=TRUE, weights = TRUE) |>
dplyr::as_tibble() |>
dplyr::rename(
line_id = ID,#rename this to line id
e=2#second column is the exposure.
)
# calculate summarized exposure for each buffer
traj_extract_line_id = traj_extract |>
dplyr::group_by(line_id) |>
dplyr::summarize(
#Jan 9, 2024 use R's built-in weighted.mean() function
#instead of calculating weighted average manually
e=stats::weighted.mean(
x=e,
w=weight,
na.rm=T),
#These weights are based on the areal overlap, not time
sum_weights = sum(weight,na.rm=T),
n_pixel = dplyr::n() # number of observations corresponds to number of pixels per line segment
) |>
dplyr::ungroup()
# check if time_data column provided
if (!(is.null(geol_proj[time_data]))){
duration_line_id = geolife_time |>
dplyr::mutate(time_elapsed = as.numeric(difftime(dplyr::lead(.data[[time_data]]),
.data[[time_data]],
units = time_unit)),
line_id = dplyr::row_number()) |>
dplyr::select(line_id, time_elapsed)
traj_extract_line_id = traj_extract_line_id |>
#now link in the time weight
dplyr::left_join(duration_line_id,by=c("line_id")) |>
dplyr::mutate(
#calculate a weight that considers both area overlap and time
end_weights = e * time_elapsed
)
} else {
traj_extract_line_id = traj_extract_line_id |>
dplyr::mutate(end_weights = e) # end result is exposure without time
}
# join weights with spatial buffer
weight_buff = dplyr::inner_join(traj_buff, traj_extract_line_id, by = 'line_id')
# rasterize results
fin_rast = terra::rasterize(weight_buff, grid_rast, field = "end_weights", fun = sum)| line_id | e | weight |
|---|---|---|
| 1 | 0.1510287 | 0.2776131 |
| 1 | 0.3074365 | 0.8561898 |
| 1 | 0.2196987 | 1.0000000 |
| 1 | 0.2476736 | 1.0000000 |
| 1 | 0.2691458 | 0.8558192 |
| 1 | 0.2917254 | 0.2783543 |
| 1 | 0.0974029 | 0.2394366 |
| 1 | 0.0545786 | 0.9831357 |
| 1 | 0.1115811 | 1.0000000 |
| 1 | 0.2017791 | 1.0000000 |
| line_id | e | sum_weights | n_pixel | time_elapsed | end_weights |
|---|---|---|---|---|---|
| 1 | 0.1818616 | 104.5206 | 127 | 0.1666667 | 0.0303103 |
| 2 | 0.1825105 | 107.2934 | 132 | 0.1666667 | 0.0304184 |
| 3 | 0.1844809 | 107.2161 | 135 | 0.1666667 | 0.0307468 |
| 4 | 0.1678029 | 105.7779 | 135 | 0.1666667 | 0.0279672 |
| 5 | 0.1756641 | 109.1288 | 137 | 0.1666667 | 0.0292774 |
| 6 | 0.2133046 | 111.2917 | 140 | 0.1666667 | 0.0355508 |
| 7 | 0.1915251 | 109.1753 | 137 | 0.1666667 | 0.0319208 |
| 8 | 0.1610424 | 110.9446 | 137 | 0.1666667 | 0.0268404 |
| 9 | 0.1730918 | 110.1436 | 137 | 0.1666667 | 0.0288486 |
| 10 | 0.1926430 | 103.0510 | 127 | 0.1666667 | 0.0321072 |
Weighted line segments buffers
Raster statistics
Statistics are calculated utilizing terra::global() function.
Code
# Output raster and statistics calculation
output_calc = function(rast, stats = NULL){
output = list()
output$rast = rast
if (!is.null(stats)){ # activity stats
raster_stats = rast_stats(rast, stats)
output$stats = raster_stats
}
return(output)
}
# raster statistics
rast_stats = function(raster, stats){
vals = data.frame(matrix(ncol = length(stats)))
colnames(vals) = stats
for (statistic in stats){
if (statistic == "range"){
range = terra::global(raster, fun = statistic, na.rm = TRUE)
value = range[,2] - range[,1]
} else if (statistic == "count"){
value = raster |> terra::freq() |> dplyr::summarise(n = sum(count)) |>
as.integer()
} else if (statistic == "area") {
value = raster |> terra::expanse() |> dplyr::select(area) |> as.integer()
} else {
value = terra::global(raster, fun = statistic, na.rm = TRUE) |> as.numeric()
}
vals[statistic] = value
}
return(vals)
}if (!is.null(env_data)){ # env_data output
output = output_calc(rast_env, stats = stats)
} else {
# calculate activity output
output = output_calc(fin_rast, stats = stats)
}Output
Output of twsagps package functions is a list consisting of enviromental exposure raster and statistics data frame.
Environmental exposure layers for all methods
| Method | count | area | min | max | range | mean | std | sum |
|---|---|---|---|---|---|---|---|---|
| PO | 1089 | 2724647 | -0.0162026 | 0.0424337 | 0.0586363 | 0.0042731 | 0.0043587 | 4.653353 |
| KDE | 18551 | 46413985 | -0.0017587 | 0.0537845 | 0.0555432 | 0.0001080 | 0.0013325 | 2.003008 |
| DR | 4517 | 11301403 | -0.0572428 | 0.1680100 | 0.2252528 | 0.0200316 | 0.0203706 | 90.482557 |
| LS | 20904 | 52301083 | -0.0810245 | 527.6466675 | 527.7276920 | 2.5270283 | 26.2162668 | 52824.999407 |
Package issues
Terra package
R is unable to save terra spatial objects as variables in environment. Only way to use terra objects after reopeing the R session is to use terra::wrap() and terra::unwrap() functions.
output_wrap = wrap(output$rast)
output$rast = unwrap(output_wrap)Several functions from the package are time-consuming and ability to reuse them in R session would be beneficial. Should we consider converting to other package or entrust the matter of saving variables to the user?
Line Segment normalization
Due to different workflow of Line Segment method (environmental exposure layer created by extracting cell values from grid raster [ Figure 1] , not like in other methods by simply multiplying activity raster by environmental layer) applying activity raster normalization is problematic and cannot be accomplished solely by rescaling activity raster values.
Implementing normalization to end weights attribute is pointless as rasterization with sum function exceeds values out of 0-1 range. Additionally, normalized activity space raster has no practical value as exposure is calculated before rasterization. Is applying normalization to Line Segment method viable without altering the method itself?
KDE/DR smoothing parameter - TDA package
DR and KDE methods utilize kde() R function from TDA package which’s bandwidth is a smoothing parameter. Bandwidth should correspond to kernel search radius in meters but in this package results differ. It is presumably induced by distinct math formulas used in reference paper (Jankowska et al. 2023) and in TDA::kde() function. In the paper formula (Silverman 1986) h is defined as size of kernel (Jankowska et al. 2023).
P(x,y) = \frac{1}{nh^2} \sum^n_{i=1} K(\begin{array} {c} d_i(x,y) \\ h \end{array})
In package formula (Wasserman 2004) h is a smoothing parameter which is additionally squared.
pX(x) = \frac{1}{n(\sqrt{2\pi} h)^d} \sum_{i=1} ^ n \exp(\frac{- \Vert x - X_i \Vert_2^2}{2h^2})
Following the advice of the author of Density Ranking function, Yen-Chi Chen, smoothing parameter of twsagps::KDE_exposure() and code previously discussed is square rooted in order to compensate for varying formulas. h parameter modifications do not nullify spatial inconsistencies. Comparison of TDA::kde() kernel search radius and adjusted dissolved buffer presents spatial inaccuracy of the function.

For the purpose of comparison alternative kde function from SpatialKDE package has been utilized in twsagps::KDE_slow(). Function’s outputs are suitable and kernel search radius is operating as intented but running time is too excessive to substitute the TDA::kde(). Comparing kde() functions remarkably low values of TDA and spatial differences are noticeable.
spatial_kde = SpatialKDE::kde(sf::st_as_sf(geol_proj), band_width = bandwidth,
grid = raster::raster(grid_rast))
spatial_kde = terra::rast(spatial_kde) TDA and SpatialKDE - activity space
TDA and SpatialKDE - normalized NDVI exposure
| Package | count | area | min | max | range | mean | std | sum |
|---|---|---|---|---|---|---|---|---|
| TDA | 18551 | 46413985 | -0.0017587 | 0.0537845 | 0.0555432 | 0.0001080 | 0.0013325 | 2.003008 |
| SpatialKDE | 19420 | 48588191 | -0.0070229 | 0.1343125 | 0.1413355 | 0.0014399 | 0.0037505 | 27.963379 |
DR function from density-ranking utilizes Empirical Cumulative Distribution Function on GPS data coordinates’ KDE and all cells coordinates’ KDE.
coords = terra::geom(geol_proj)[,3:4]
expand_grid = expand.grid(x_seq,y_seq)
D_kde = TDA::kde(X = coords, Grid = coords, h = bandwidth_dr)
grid_kde = TDA::kde(X = coords, Grid = expand_grid, h = bandwidth_dr)
DR_end = ecdf(D_kde)(grid_kde)Due to processing of data values abnormalities are partially mitigated and smoothing parameter is modified. As h parameter is defined in units of coordinates, bandwidths in degrees to corresponding search radiuses in meters (0.001 - 400 m, 0.007 - 200 m, 0.005 - 100 m). These may serve as a reference in evaluating smoothing parameter values in meters.

With the intention of using the bandwidth in meters in KDE and DR twsagps functions, method of transforming h parameter from distance unit to h applied in TDA::kde() math formula should be derived.
Further development
Implementing extent parameter that enables specifying research area would grant increased control over formation of grid raster and allow removal of incorrectly located data. Functions should handle sf bbox and terra SpatExtent class objects as well as vector data of 4 length (xmin, xmax, ymin, ymax).
Package should process vector data as their environmental data input as it would enable usage of sf and terra SpatVector objects. Additionally, user should have the opportunity to generate buffer around the vector data to embody spatial influence of enviromental objects.
User ought to be avaible to insert spatial data in form different from data frame. Implementing handling spatial class object of terra SpatVector and sf points would broaden variety of accesible data.
Applying iteration over data groups would allow to derive distinct seperate IDs, types of mobility and dates in seperate outputs.